{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "BVp8vazXYOz-"
      },
      "source": [
        "##### Copyright 2024 Google LLC."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "cellView": "form",
        "id": "DBaXaQ_PYT4p"
      },
      "outputs": [],
      "source": [
        "# @title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
        "# you may not use this file except in compliance with the License.\n",
        "# You may obtain a copy of the License at\n",
        "#\n",
        "# https://www.apache.org/licenses/LICENSE-2.0\n",
        "#\n",
        "# Unless required by applicable law or agreed to in writing, software\n",
        "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
        "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
        "# See the License for the specific language governing permissions and\n",
        "# limitations under the License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "XSkl5h3dZMo9"
      },
      "source": [
        "# PaliGemma2 - Run with Transformers.js\n",
        "\n",
        "Author: Sitam Meur\n",
        "\n",
        "*   GitHub: [github.com/sitamgithub-MSIT](https://github.com/sitamgithub-MSIT/)\n",
        "*   X: [@sitammeur](https://x.com/sitammeur)\n",
        "\n",
        "Description: This notebook demonstrates how you can run inference on PaliGemma2 model using Node.js and [Transformers.js](https://huggingface.co/docs/transformers.js/index). Transformers.js lets you run Hugging Face's transformer models directly in browser, offering a JavaScript API similar to Python's.  It supports NLP, computer vision, audio, and multimodal tasks using ONNX Runtime and allows easy conversion of PyTorch, TensorFlow, and JAX models.\n",
        "\n",
        "<table align=\"left\">\n",
        "  <td>\n",
        "    <a target=\"_blank\" href=\"https://colab.research.google.com/github/google-gemini/gemma-cookbook/blob/main/PaliGemma/[PaliGemma_2]Using_with_Transformersjs.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n",
        "  </td>\n",
        "</table>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Ftkrrn3aZyAl"
      },
      "source": [
        "## Setup\n",
        "\n",
        "### Select the Colab runtime\n",
        "To complete this tutorial, you'll need to have a Colab runtime with sufficient resources to run the PaliGemma 2 model. In this case, you can use CPU runtime:\n",
        "\n",
        "1. In the upper-right of the Colab window, select **▾ (Additional connection options)**.\n",
        "2. Select **Change runtime type**.\n",
        "3. Under **Hardware accelerator**, select **CPU**."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "eCJ7yo3-Zzdj"
      },
      "source": [
        "## Installation"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ET_KH77YZ5lc"
      },
      "source": [
        "Let's get started with installing the dependencies."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "d7Ds4h2q0ItU"
      },
      "outputs": [],
      "source": [
        "# Install Node.js\n",
        "!curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -\n",
        "!sudo apt-get install -y nodejs"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ObI8d_Rwa_nn"
      },
      "source": [
        "## Create Node.js project\n",
        "\n",
        "Create a new Node.js project and install the required transformers package via [NPM](https://www.npmjs.com/package/@huggingface/transformers)."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "5nmPcg5J0cYj"
      },
      "outputs": [],
      "source": [
        "# Create project directory\n",
        "!mkdir paligemma2-node\n",
        "%cd paligemma2-node\n",
        "\n",
        "# Initialize NPM project\n",
        "!npm init -y\n",
        "!npm i @huggingface/transformers"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "gGYVoOXA3T7-"
      },
      "outputs": [],
      "source": [
        "%%writefile package.json\n",
        "\n",
        "{\n",
        "  \"name\": \"paligemma2-node\",\n",
        "  \"version\": \"1.0.0\",\n",
        "  \"main\": \"index.js\",\n",
        "  \"type\": \"module\",\n",
        "  \"scripts\": {\n",
        "    \"test\": \"echo \\\"Error: no test specified\\\" && exit 1\"\n",
        "  },\n",
        "  \"keywords\": [],\n",
        "  \"author\": \"\",\n",
        "  \"license\": \"ISC\",\n",
        "  \"description\": \"\",\n",
        "  \"dependencies\": {\n",
        "    \"@huggingface/transformers\": \"^3.1.2\"\n",
        "  }\n",
        "}"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "U9mJ6Pi3bxCY"
      },
      "source": [
        "## Transformers.js Inference\n",
        "\n",
        "Now, let's run inference on the PaliGemma2 model using Transformers.js. First, load the model and processor and then prepare inputs (Text query + Image) to run inference and get the output as desired image caption. For reference, you can check the model's page on the Hugging Face model hub under ONNX models section [here](https://huggingface.co/onnx-community/paligemma2-3b-pt-224)."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "8149e6d989e6"
      },
      "outputs": [],
      "source": [
        "# Show the image from the URL\n",
        "from PIL import Image\n",
        "import requests\n",
        "\n",
        "url = \"https://jethac.github.io/assets/juice.jpg\"\n",
        "img = Image.open(requests.get(url, stream=True).raw) \n",
        "img"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "dbed89fc4a5f"
      },
      "source": [
        "It's an image of a cat sitting on a bag, now let's see what the model predicts."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "-Gy2LdaY3Iqh"
      },
      "outputs": [],
      "source": [
        "%%writefile index.js\n",
        "\n",
        "// Import the required modules\n",
        "import {\n",
        "  AutoProcessor,\n",
        "  PaliGemmaForConditionalGeneration,\n",
        "  load_image,\n",
        "} from \"@huggingface/transformers\";\n",
        "\n",
        "// Load processor and model\n",
        "const model_id = \"onnx-community/paligemma2-3b-pt-224\"; // Change this to use a different PaliGemma model\n",
        "const processor = await AutoProcessor.from_pretrained(model_id);\n",
        "const model = await PaliGemmaForConditionalGeneration.from_pretrained(\n",
        "  model_id,\n",
        "  {\n",
        "    dtype: {\n",
        "      embed_tokens: \"q8\", // or 'fp16'\n",
        "      vision_encoder: \"q8\", // or 'q4', 'fp16'\n",
        "      decoder_model_merged: \"q4\", // or 'q4f16'\n",
        "    },\n",
        "  }\n",
        ");\n",
        "console.log(\"Model and processor loaded successfully!\");\n",
        "\n",
        "// Prepare inputs\n",
        "const url = \"https://jethac.github.io/assets/juice.jpg\";\n",
        "const raw_image = await load_image(url);\n",
        "const prompt = \"<image>\"; // Caption, by default\n",
        "const inputs = await processor(raw_image, prompt);\n",
        "console.log(\"Inputs prepared successfully!\");\n",
        "\n",
        "try {\n",
        "  // Generate a response\n",
        "  const response = await model.generate({\n",
        "    ...inputs,\n",
        "    max_new_tokens: 100, // Maximum number of tokens to generate\n",
        "  });\n",
        "\n",
        "  // Extract generated IDs from the response\n",
        "  const generatedIds = response.slice(null, [inputs.input_ids.dims[1], null]);\n",
        "\n",
        "  // Decode the generated IDs to get the answer\n",
        "  const decodedAnswer = processor.batch_decode(generatedIds, {\n",
        "    skip_special_tokens: true,\n",
        "  });\n",
        "\n",
        "  // Log the generated caption\n",
        "  console.log(\"Generated caption:\", decodedAnswer[0]);\n",
        "} catch (error) {\n",
        "  console.error(\"Error generating response:\", error);\n",
        "}"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Zz81XOKebf_j"
      },
      "source": [
        "## Run Application"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "PmVUVFDf3-yE"
      },
      "outputs": [],
      "source": [
        "# Run the node.js application\n",
        "!node index.js"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "gFgOMvLmcVnY"
      },
      "source": [
        "## Conclusion\n",
        "\n",
        "Congratulations! You have successfully run inference on PaliGemma2 model using Transformers.js via Node.js environment. You can now integrate this into your projects."
      ]
    }
  ],
  "metadata": {
    "colab": {
      "name": "[PaliGemma_2]Using_with_Transformersjs.ipynb",
      "toc_visible": true
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
